In this paper, we attempt to examine whether the use of change-point analysis techniques is appropriate for detecting fatigue based on data captured from wearable sensors. As such, we perform a secondary data analysis to the data generated in: Baghdadi et al., 2018. The reader should note that their raw data was preprocessed using:

  1. Kalman Filter: Used to process the raw data from sensors to: (i) estimate the spatial orientation of the body with respect to the global reference frame, and (ii) to estimate the kinematics of motion.

  2. Segmentation: The motion segments where then segmented using an algorithm that assumes the existence of two peaks in the translational acceleration of the gait cycle. This assumption was justified based on the results of Tongen and Wunderlich, 2010 as well as the authors’ preliminary analyses.

The reader can show any code chunk by clicking on the code button. We chose to make the default for the code hidden since we: (a) wanted to improve the readability of this document; and (b) assumed that the readers will not be interested in reading every code chunk.


1 Loading Data & Generating Features

The snippet below documents the list of R libraries that were used in this research. For convenience, we used the pacman package since it allows for installing/loading the needed libraries in one step.

rm(list = ls()) # clear global environment
graphics.off() # close all graphics
library(pacman) # needs to be installed first
p_load(R.matlab, plotly, extrafont, grDevices, gridExtra,
       dplyr, stringr, tidyverse, utils, reshape2,
       anomalize, MVN, forecast, fractal,
       ecp, dfphase1, 
       MALDIquant, TSclust,
       knitr, kableExtra, dplyr) 

In the snippet below, we extract the 15 “.mat” files in the GitHub repository (where we loaded the data to allow for the reproduction of our work). Note that these files were originally produced in Baghdadi et al., 2018. Then, we perform several transformation steps: (a) extracting the data for the first three columns in the matlab arrays; and (b) computing three kinematic features from the data corresponding to these columns. Due to the differences between Matlab and R, this requires two nested for loops. The outer loop increments over the number of subjects, while the inner loop increments based on the different number of rows of data for each subject. Please see the comments within the code chunk for more details.

num_subjects <- seq(1, 15)
subject_heights <- c(1.71, 1.77, 1.71, 1.59, 1.69,
                     1.63, 1.60, 1.71, 1.67, 1.78,
                     1.68, 1.55, 1.83, 1.81, 1.89)

# Initilizing a df for summary data on participants
summary_df <- data.frame(matrix(nrow = 15, ncol = 9))
colnames(summary_df) <- c("Subject.Num", "num.rows",
                          "num.cols", "mean.scaled.stride.len",
                          "sd.scaled.stride.len",
                          "mean.scaled.stride.height",
                          "sd.scaled.stride.height",
                          "mean.stride.duration",
                          "sd.stride.duartion")

for (i in 1:length(num_subjects)) {
  # Reading the .mat files from GitHub
  raw_data <- readMat(paste0("https://github.com/jqtcase/fatigue-changepoint/blob/master/Data/Raw/Subject",num_subjects[i],".mat?raw=true"))
  # Compute the number of cells, and rows in each structered matrix
  raw_data_size <- lengths(raw_data) # num of cells
  num_rows <- raw_data_size / 17 # all data had 17 cols
  # Initilizing the six lists needed for storing the data (we keep track of the top 3 for error checking)
  time_in_sec <- vector("list", length = num_rows)
  position_x <- vector("list", length = num_rows)
  position_y <- vector("list", length = num_rows)
  stride_time <- vector("list", length = num_rows)
  stride_length <- vector("list", length = num_rows)
  stride_height <- vector("list", length = num_rows)
  stride_duration <- vector("list", length = num_rows)
  
  # Following for loop is needed since R reads the structured array as a nested list. The list containing the data is called "M.i.k" and it transforms/reads the original array --> rowise. This means that our first three features (with the same timestamp) are always seperated with a distance equal to the total number of rows
  for (j in 1:num_rows) {
    position_x[[j]] <- raw_data[["M.i.k"]][[j]]
    position_y[[j]] <- raw_data[["M.i.k"]][[num_rows + j]]
    stride_time[[j]] <- raw_data[["M.i.k"]][[2 * num_rows + j]]
    dataholder <- raw_data[["M.i.k"]][[16 * num_rows + j]] # data holder for time
    # Computing the three needed kinematic features
    stride_length[[j]] <-
      range(position_x[[j]])[2] - range(position_x[[j]])[1]
    stride_height[[j]] <-
      range(position_y[[j]])[2] - range(position_y[[j]])[1]
    stride_duration[[j]] <-
      range(stride_time[[j]])[2] - range(stride_time[[j]])[1]
    time_in_sec[[j]] <- lapply(dataholder, mean)# using mean time of stride as a time stamp
  }
  
  # Scaling and creating one data frame per subject
  assign(paste0("subject", i, "_features"), 
         data.frame(time.from.start = unlist(time_in_sec), 
                    scaled.stride.len = unlist(stride_length)/subject_heights[i], 
                    scaled.stride.height = unlist(stride_height) / subject_heights[i], 
                    stride.duration = unlist(stride_duration)
                    )
         )
  
  # Creating a Summary Data Frame
  df_name <- paste0("subject", i, "_features")
  summary_df[i, 1] <- paste0("subject", i)
  summary_df[i, 2] <- get(df_name) %>% nrow()
  summary_df[i, 3] <- get(df_name) %>% ncol()
  summary_df[i, 4] <- get(df_name)[, 2] %>% mean() %>% round(digits = 4)
  summary_df[i, 5] <- get(df_name)[, 2] %>% sd() %>% round(digits = 4)
  summary_df[i, 6] <- get(df_name)[, 3] %>% mean() %>% round(digits = 4)
  summary_df[i, 7] <- get(df_name)[, 3] %>% sd() %>% round(digits = 4)
  summary_df[i, 8] <- get(df_name)[, 4] %>% mean() %>% round(digits = 4)
  summary_df[i, 9] <- get(df_name)[, 4] %>% sd() %>% round(digits = 4)
}
# Printing the top six rows of Subject 4's data as an example
head(subject4_features) %>% round(digits = 3)
# A Summary of the features for all 15 participants
summary_df
rm(raw_data, raw_data_size, i, j, num_rows, 
   dataholder, num_subjects)
save.image(file = "./Data/RGenerated/FeatureGeneration.RData")

Based on the analysis above, there are three observations to be made. First, we scaled the stride length and height based on the subject’s height. This in essence allows us to capture the steps as a percentage of the person’s height. This reduces the between subject variablity in the data and is supported by the seminal work of Oberg et al., 1993. Second, we showed the first six rows from Subject 4 to provide the reader with some insights into the sampling frequeny (after the preprocessing done in Baghdadi et al. 2018). Note that the kinematic features are computed from the sensor channels provided in their paper. Third, we saved the generated data into an R.data file, which can be accessed by clicking on: FeatureGeneration.RData. We hope that this saved data allows other researchers to reproduce and/or build on our work.


1.1 Plotting the Data

First, we use a standard line plot to depict the data for each feature by person. For the sake of facilitating the visualization process, we: (a) panel the plot such that the left, center and right panels correspond to the scaled stride length, scaled stride height and step duration, respectively; and (b) we save the results of each participant (P) in a different tab.

# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
for (i in 1:15) {
  df_transformed <- melt(get(paste0("subject",i,"_features")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         )
  assign(paste0("p",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start,y=value,group=variable,color=variable,
                    shape=variable)) + 
           geom_line() + theme_bw() + 
           #ylim(0,2) +
           ggtitle(paste0("Participant ",i,": Pre-filtered Data")) + 
           theme(legend.position="none") + 
           facet_wrap(~variable,nrow=3, scales = "free_y")
         )
  cat('###',paste0("P", i), "{-}",'\n')
  print(get(paste0("p",i)))
  cat('\n \n')
  
  stat_printdf <- get(paste0("subject",i,"_features"))
  
 cat(paste0('<source> <p> Based on the <b>raw data</b>, Participant ', i,
             ' has an average scaled stride length of ',
             round(mean(stat_printdf[,2]),2),
             ', an average scaled stride height of ',
             round(mean(stat_printdf[,3]),2),
             ' and an average stride duration of ', round(mean(stat_printdf[,4]),2),
             ' seconds. Based on their height of ', subject_heights[i],
             ' meters, their average stride length and stride height are', ' ',
             round(mean(stat_printdf[,2])*subject_heights[i],2),
             ' and ', ' ', round(mean(stat_printdf[,3])*subject_heights[i],2),
             ', respectively.', '</p> </source>'
             )
      )
cat(' \n \n')
}

P1

Based on the raw data, Participant 1 has an average scaled stride length of 1.07, an average scaled stride height of 0.09 and an average stride duration of 1.06 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.82 and 0.16, respectively.

P2

Based on the raw data, Participant 2 has an average scaled stride length of 1.63, an average scaled stride height of 0.13 and an average stride duration of 0.95 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.88 and 0.23, respectively.

P3

Based on the raw data, Participant 3 has an average scaled stride length of 0.94, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.61 and 0.16, respectively.

P4

Based on the raw data, Participant 4 has an average scaled stride length of 1.28, an average scaled stride height of 0.12 and an average stride duration of 0.95 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 2.03 and 0.2, respectively.

P5

Based on the raw data, Participant 5 has an average scaled stride length of 0.96, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.63 and 0.15, respectively.

P6

Based on the raw data, Participant 6 has an average scaled stride length of 0.91, an average scaled stride height of 0.11 and an average stride duration of 1.06 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.48 and 0.18, respectively.

P7

Based on the raw data, Participant 7 has an average scaled stride length of 0.84, an average scaled stride height of 0.09 and an average stride duration of 1.12 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.34 and 0.15, respectively.

P8

Based on the raw data, Participant 8 has an average scaled stride length of 1.53, an average scaled stride height of 0.14 and an average stride duration of 0.98 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.62 and 0.24, respectively.

P9

Based on the raw data, Participant 9 has an average scaled stride length of 0.87, an average scaled stride height of 0.1 and an average stride duration of 1.11 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.46 and 0.17, respectively.

P10

Based on the raw data, Participant 10 has an average scaled stride length of 1.28, an average scaled stride height of 0.11 and an average stride duration of 1.01 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.27 and 0.2, respectively.

P11

Based on the raw data, Participant 11 has an average scaled stride length of 1.2, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 2.02 and 0.15, respectively.

P12

Based on the raw data, Participant 12 has an average scaled stride length of 1.61, an average scaled stride height of 0.14 and an average stride duration of 0.89 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.5 and 0.21, respectively.

P13

Based on the raw data, Participant 13 has an average scaled stride length of 0.67, an average scaled stride height of 0.07 and an average stride duration of 1.14 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.23 and 0.13, respectively.

P14

Based on the raw data, Participant 14 has an average scaled stride length of 1.32, an average scaled stride height of 0.14 and an average stride duration of 0.93 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.39 and 0.25, respectively.

P15

Based on the raw data, Participant 15 has an average scaled stride length of 1.18, an average scaled stride height of 0.1 and an average stride duration of 1.04 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.24 and 0.2, respectively.

2 Implementation of the Filtering and Data Preprocessing

In this task, we examined several approaches for outlier detection. We originally hypothesized that these outliers constitue faulty sensor readings (since they are not sustained), and thus, we can remove this data prior to any further analysis. However, based on a detailed investigation of the data and the application of several outlier detection methods (univariate and multivariate), we have learned that the data is non-normal, highly autocorrelated and non-stationary. Thus, these approaches are not suitable for removing the outliers from our data. Thus, we then investiaged the utilization of “filtering” based methods in Section 3 to impute/correct/smoothen the data.

After examining several outlier detection methods, i.e., boxplots for outlier detection and Mahalanobis distance based methods, our final approach to detecting and then removing outliers is to use smoothing techniques to “correct” the data. Based on the results shown in the previous sections, the fundamental hypotheses behind this approach are:

  1. The sensors’ data seem to be non-stationary, autocorrelated and somewhat complex (from a statistical perspective). More importantly, there are spikes in the data that cannot be explained by the kinematic model. These spikes produce values that are unrealistic (e.g., a scaled stride length > 2 is not possible since that means that the stride of the person is two times larger than their height). While this insight is useful, there is no hard limit that we can enforce since the literature reports mean values across the population instead of attempting to provide a physical threshold on what is possible.

  2. The outliers tended to appear in pairs of two; indicating a possible issue in the segmentation of the data. Note that we label this data as outliers since it is not sustained (i.e., it is unlikely to say that a person is fatigued for only two stride).

  3. Smoothing (i.e. a low pass filter on the data) will allow us to impute values for every stride (after an initial size window). This will allow us to preserve the successive nature of the strides, “fix” the outliers based on neighboring strides, and not throw away any observation.

2.1 The Median Filtering Approach from the Signal Processing Literature

An algorithmic approach based on these observations is implemented below. Note that we are using the median filter from the fractal package on CRAN. In the analysis below, we are applying the median Filter, with n=29 (i.e., the median is calculated based on a overlaping moving window of size 29 strides). We chose this window size based on the recommendation of our biomechanics expert as it allows us to: (a) ignore the effect of any turns in the direction of motion; and (b) the corresponding window size is approximately 30 seconds long (which presents an easy to remember rule for practical application). The data processed with the median filtered approach is stored at the following location in our GitHub Repository. The median filter was also applied on the CUSUM of the features and the results from this analysis can be found at: CusumData.RData.

# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
for (i in 1:15) {
  df <- get(paste0("subject",i,"_features")) # Getting Data from Sec.
  # Obtaining the Outliers for each of the three variables
  time.from.start <- df[,1]
  scaled.stride.len <- medianFilter(df[,2], order = 29) # moving window of size 29 obs
  scaled.stride.height <- medianFilter(df[,3], order = 29) # moving window of size 29 obs
  stride.duration <- medianFilter(df[,4], order = 29) # moving window of size 29 obs
 
   # Remove the observations corresponding to the outliers
  assign(paste0("subject",i,"_medianf"), 
         data.frame(cbind(time.from.start, scaled.stride.len, 
                          scaled.stride.height, stride.duration)))
  
  # Preparing the data for the Line Graph
  df_transformed <- melt(get(paste0("subject",i,"_medianf")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
  
  assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Data post the application of the median filter") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y")
         )
  
  
  cat('###',paste0("P", i), "{-}",'\n')
  
  print(get(paste0("g",i)))
  
  cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>median filter</b>, Participant ', i,
             ' has an average scaled stride length of ',
             round(mean(scaled.stride.len),2),
             ', an average scaled stride height of ',
             round(mean(scaled.stride.height),2),
             ' and an average stride duration of ', round(mean(stride.duration),2),
             ' seconds. Based on their height of ', subject_heights[i],
             ' meters, their average stride length and stride height are', ' ',
             round(mean(scaled.stride.len)*subject_heights[i],2),
             ' and ', ' ', round(mean(scaled.stride.height)*subject_heights[i],2),
             ', respectively.', '</p> </source>'
             )
      )
    
  cat(' \n \n')

}

P1

Based on the median filter, Participant 1 has an average scaled stride length of 1.08, an average scaled stride height of 0.09 and an average stride duration of 1.06 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.84 and 0.16, respectively.

P2

Based on the median filter, Participant 2 has an average scaled stride length of 1.65, an average scaled stride height of 0.13 and an average stride duration of 0.94 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.91 and 0.24, respectively.

P3

Based on the median filter, Participant 3 has an average scaled stride length of 0.94, an average scaled stride height of 0.09 and an average stride duration of 1.08 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.61 and 0.15, respectively.

P4

Based on the median filter, Participant 4 has an average scaled stride length of 1.31, an average scaled stride height of 0.13 and an average stride duration of 0.93 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 2.09 and 0.2, respectively.

P5

Based on the median filter, Participant 5 has an average scaled stride length of 0.98, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.66 and 0.15, respectively.

P6

Based on the median filter, Participant 6 has an average scaled stride length of 0.92, an average scaled stride height of 0.11 and an average stride duration of 1.05 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.5 and 0.17, respectively.

P7

Based on the median filter, Participant 7 has an average scaled stride length of 0.85, an average scaled stride height of 0.09 and an average stride duration of 1.13 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.36 and 0.15, respectively.

P8

Based on the median filter, Participant 8 has an average scaled stride length of 1.59, an average scaled stride height of 0.14 and an average stride duration of 0.96 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.72 and 0.24, respectively.

P9

Based on the median filter, Participant 9 has an average scaled stride length of 0.87, an average scaled stride height of 0.1 and an average stride duration of 1.11 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.46 and 0.17, respectively.

P10

Based on the median filter, Participant 10 has an average scaled stride length of 1.34, an average scaled stride height of 0.11 and an average stride duration of 0.99 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.38 and 0.2, respectively.

P11

Based on the median filter, Participant 11 has an average scaled stride length of 1.22, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 2.04 and 0.15, respectively.

P12

Based on the median filter, Participant 12 has an average scaled stride length of 1.64, an average scaled stride height of 0.14 and an average stride duration of 0.88 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.55 and 0.22, respectively.

P13

Based on the median filter, Participant 13 has an average scaled stride length of 0.68, an average scaled stride height of 0.07 and an average stride duration of 1.15 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.24 and 0.13, respectively.

P14

Based on the median filter, Participant 14 has an average scaled stride length of 1.37, an average scaled stride height of 0.14 and an average stride duration of 0.92 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.48 and 0.25, respectively.

P15

Based on the median filter, Participant 15 has an average scaled stride length of 1.22, an average scaled stride height of 0.1 and an average stride duration of 1.03 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.3 and 0.19, respectively.

# Saving a list of cleaned 
save(subject1_medianf, subject2_medianf, subject3_medianf, subject4_medianf,
     subject5_medianf, subject6_medianf, subject7_medianf, subject8_medianf,
     subject9_medianf, subject10_medianf, subject11_medianf, subject12_medianf,
     subject13_medianf, subject14_medianf, subject15_medianf,
     file = "./Data/RGenerated/MedianFilteredData.RData")

###################################################################################
# Applying the Median Filter Methodology on the CUSUM of the Features

load(file = "./Data/RGenerated/MedianFilteredData.RData")
  for (i in 1:15) {
    df <- get(paste0("subject",i,"_medianf"))
  means <- colMeans(df)
  df[,2] <- cumsum(df[,2]-means[2])
  df[,3] <- cumsum(df[,3]-means[3])
  df[,4] <- cumsum(df[,4]-means[4])
  assign(paste0("subject",i,"_cusum"), df)  
    
  }

# Saving a list of cleaned 
save(subject1_cusum, subject2_cusum, subject3_cusum,
     subject4_cusum, subject5_cusum, subject6_cusum,
     subject7_cusum, subject8_cusum, subject9_cusum,
     subject10_cusum, subject11_cusum, subject12_cusum,
     subject13_cusum, subject14_cusum, subject15_cusum,
       file = "./Data/RGenerated/CusumData.RData")

2.2 Using %Time-from-Start to Standardize the Length of Each Participant’s Time-series

In this section, we rescale/normalize the timescale a 0-100% scale for all participants to prepare for a timeseries clustering type analysis. Specifically, we sampled the three features from the data based on a 0.05% from start. This resulted in a standardization of the number of observations within each timeseries across our 15 paricipants. Based on our increment size, each timeseries is now comprised of 2000 observations, each at 0.05% time increment from its neighouring data points. The rest of our analysis will be performed on this standardized data. The normalized data can be found at: NormMedianFilteredData.RData and NormCusumData.RData

# to load previously generated data 
load(file = "./Data/RGenerated/FeatureGeneration.RData")

medianfilt_norm <- list() 
cusum_norm <- list()

for (i in 1:15) {
  
  df <- get(paste0("subject",i,"_features")) # Getting Data from Sec.
  
  # resampling the data to 2000 samples
  nsamples = 2000
  timePerc <- seq(0, 100, by=0.05)
  timeScale <- df[nrow(df),1]/nsamples
  df_norm = df[FALSE,]
  
  for (j in 1:nsamples) {
    df_matchIsx <- match.closest(timeScale*(j-0.5),df[,1]) # finding the closest point
    df_keep <- df[df_matchIsx,]
    df_keep[,1] <- timePerc[j]
    df_norm <- rbind(df_norm, df_keep)
  }
  assign(paste0("subject",i,"_features_norm"), df_norm)
  
  # median filter  
  df <- get(paste0("subject",i,"_features_norm")) # Getting Data from Sec.
  # Obtaining the Outliers for each of the three variables
  percent.time.from.start <- df[,1]
  scaled.stride.len <- medianFilter(df[,2], order = 29) # moving window of size 29 obs
  scaled.stride.height <- medianFilter(df[,3], order = 29) # moving window of size 29 obs
  stride.duration <- medianFilter(df[,4], order = 29) # moving window of size 29 obs
  assign(paste0("subject",i,"_medianf_norm"), 
         data.frame(cbind(percent.time.from.start, scaled.stride.len, 
                          scaled.stride.height, stride.duration)))
  medianfilt_norm[[i]] <- get(paste0("subject",i,"_medianf_norm"))

  
  # cusum calculation
  df <- get(paste0("subject",i,"_medianf_norm"))
  means <- colMeans(df)
  df[,2] <- cumsum(df[,2]-means[2])
  df[,3] <- cumsum(df[,3]-means[3])
  df[,4] <- cumsum(df[,4]-means[4])
  assign(paste0("subject",i,"_cusum_norm"), df)  
  cusum_norm[[i]] <- get(paste0("subject",i,"_cusum_norm"))
  
}


save(subject1_medianf_norm, subject2_medianf_norm, subject3_medianf_norm, subject4_medianf_norm,
     subject5_medianf_norm, subject6_medianf_norm, subject7_medianf_norm, subject8_medianf_norm,
     subject9_medianf_norm, subject10_medianf_norm, subject11_medianf_norm, subject12_medianf_norm,
     subject13_medianf_norm, subject14_medianf_norm, subject15_medianf_norm,
     file = "./Data/RGenerated/NormMedianFilteredData.RData")  

save(subject1_cusum_norm, subject2_cusum_norm, subject3_cusum_norm,
     subject4_cusum_norm, subject5_cusum_norm, subject6_cusum_norm,
     subject7_cusum_norm, subject8_cusum_norm, subject9_cusum_norm,
     subject10_cusum_norm, subject11_cusum_norm, subject12_cusum_norm,
     subject13_cusum_norm, subject14_cusum_norm, subject15_cusum_norm,
       file = "./Data/RGenerated/NormCusumData.RData")

3 Changepoint Detection

A common approach for changepoint detection is to project the multivariate problem into a univariate-space using principal component analysis. Then, apply a univariate changepoint detection approach on the problem. We examined this approach and found the following:

  1. The e.divise() produces a lot of changepoints with the deafult sig.level;
  2. Number of changepoints is still large with sig.lvl = 0.001;

For this reason, we do not think that this approach is promising. We will move on to the multivariate approach.

3.1 The Nonparameteric Approach of Matterson & James (ECP)

In this section, we will examine the use of the approach of Matteson and James (2014) for multiple change point analysis of our trivariate data. Our analysis takes advantage of their R package titled ecp, which is described in more details in the following paper: James and Matteson (2014). The multivariate changepoint method will be first tested using the median filtered data and then using the CUSUM of the median filtered data.

3.1.1 Using the Median Filtered Data

To apply the ecp approach of Matteson and James (2014) on the median filtered data, we need to use a window size (which reflects the smallest size of a shift that is to be detected). From a biomechanics perspective, we expect that the change of performance due to fatigue will be sustained at least for several minutes (until the person adapts and/or rests). Thus, the size of the window (\(\small m_{ecp}\)) should be \(\small m_{ecp} \ge 60\). In our analysis, we tried the following window sizes: 60, 120, 180, 300, and 600. For most of the participants, the number and/or location of change were almost similar. For demonstration purpose, we use \(\small m_{ecp} = 120\) below.

  load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
  pen <- function(x) -length(x) #Equally penalizes every additional changepoint
  window.size <- 120
  changepoints_medianf <- list()

  for (i in 1:15) {
      df <- get(paste0("subject",i,"_medianf_norm"))
      # Making the DF divisiable by the window size
      # (i.e., we are removing the remainder so that our chunks are equally sized)
      rows.to.keep <- nrow(df[,2:4]) %/% window.size
      remainder <- nrow(df[,2:4]) %% window.size
      df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
                                      2:4] %>% as.matrix()
      
      # Utilizing the e.agglo() from the ecp package
      mem <- rep(1:rows.to.keep, each=window.size)
      y = e.agglo(X=df.MultivariateChangepoint,
                  member= mem,
                  alpha = 1,
                  penalty = pen)
      fatigue_from_ECP <- y$estimates # Returns observation number
      changepoints_medianf[[i]] <- df[fatigue_from_ECP,1]
      
      cat('####',paste0("P", i), "{-}",'\n')
      df_transformed <- melt(get(paste0("subject",i,"_medianf_norm")),
                         id.vars = "percent.time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
      
      assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=percent.time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Changepoints for the Median Filtered Data (at vertical black lines)") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_vline(xintercept= changepoints_medianf[[i]])
         )
      
      print(get(paste0("g",i)))
  
  cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for participant ', i,
             ' is equal to: ',
             length(changepoints_medianf[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_medianf[[i]]), collapse = ", "),
             '. The results from the analysis above can be found at: <a href="https://github.com/jqtcase/fatigue-changepoint/blob/master/Data/RGenerated/ECPChangePointsMFData.RData">ECPChangePointsMFData.RData</a>. </p> </source>')
      )
  cat('\n \n')
  }

P1

Based on the ecp package, the number of changepoints for participant 1 is equal to: 2. These changepoints are located at: 60, 78. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P2

Based on the ecp package, the number of changepoints for participant 2 is equal to: 3. These changepoints are located at: 0, 66, 96. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P3

Based on the ecp package, the number of changepoints for participant 3 is equal to: 2. These changepoints are located at: 18, 90. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P4

Based on the ecp package, the number of changepoints for participant 4 is equal to: 3. These changepoints are located at: 0, 60, 96. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P5

Based on the ecp package, the number of changepoints for participant 5 is equal to: 3. These changepoints are located at: 0, 42, 96. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P6

Based on the ecp package, the number of changepoints for participant 6 is equal to: 3. These changepoints are located at: 0, 54, 96. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P7

Based on the ecp package, the number of changepoints for participant 7 is equal to: 2. These changepoints are located at: 0, 96. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P8

Based on the ecp package, the number of changepoints for participant 8 is equal to: 2. These changepoints are located at: 18, 54. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P9

Based on the ecp package, the number of changepoints for participant 9 is equal to: 1. These changepoints are located at: 6. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P10

Based on the ecp package, the number of changepoints for participant 10 is equal to: 2. These changepoints are located at: 60, 90. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P11

Based on the ecp package, the number of changepoints for participant 11 is equal to: 3. These changepoints are located at: 0, 18, 96. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P12

Based on the ecp package, the number of changepoints for participant 12 is equal to: 2. These changepoints are located at: 54, 66. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P13

Based on the ecp package, the number of changepoints for participant 13 is equal to: 1. These changepoints are located at: 6. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P14

Based on the ecp package, the number of changepoints for participant 14 is equal to: 4. These changepoints are located at: 0, 24, 42, 96. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P15

Based on the ecp package, the number of changepoints for participant 15 is equal to: 2. These changepoints are located at: 6, 30. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

save(changepoints_medianf,
       file = "./Data/RGenerated/ECPChangePointsMFData.RData")

3.1.2 Using the CUSUM of the Median Filtered Data

Here, we apply the ecp approach of Matterson and James (2014) on the cusums of the median filtered data (as opposed to the median filtered timeseries). For demonstration purpose, we use \(\small m_{ecp} = 120\) below.

load(file = "./Data/RGenerated/NormCusumData.RData")
  pen <- function(x) -length(x) #Equally penalizes every additional changepoint
  window.size <- 120
  changepoints_cusum <- list()

  for (i in 1:15) {
      df <- get(paste0("subject",i,"_cusum_norm"))
      # Making the DF divisiable by the window size
      # (i.e., we are removing the remainder so that our chunks are equally sized)
      rows.to.keep <- nrow(df[,2:4]) %/% window.size
      remainder <- nrow(df[,2:4]) %% window.size
      df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
                                      2:4] %>% as.matrix()
      
      # Utilizing the e.agglo() from the ecp package
      mem <- rep(1:rows.to.keep, each=window.size)
      y = e.agglo(X=df.MultivariateChangepoint,
                  member= mem,
                  alpha = 1,
                  penalty = pen)
      fatigue_from_CUSUM_ECP <- y$estimates
      changepoints_cusum[[i]] <- df[fatigue_from_CUSUM_ECP,1]
      
      cat('####',paste0("P", i), "{-}",'\n')
      df_transformed <- melt(get(paste0("subject",i,"_cusum_norm")),
                         id.vars = "percent.time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
      
      assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=percent.time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Changepoints for the CUSUMs of the Median Filtered Data (vertical black lines)") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_vline(xintercept= changepoints_cusum[[i]])
         )
      
      print(get(paste0("g",i)))
  
  cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for the cusums of the median filtered data for participant ', i,
             ' is equal to: ',
             length(changepoints_cusum[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_cusum[[i]]), collapse = ", "),
             '. The results from the analysis above can be found at: <a href="https://github.com/jqtcase/fatigue-changepoint/blob/master/Data/RGenerated/ECPChangePointsCUSUM.RData">ECPChangePointsCUSUM.RData</a>. </p> </source>')
      )
  cat('\n \n')
  }

P1

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 2. These changepoints are located at: 24, 72. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P2

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 2. These changepoints are located at: 42, 78. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P3

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 4. These changepoints are located at: 12, 36, 72, 84. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P4

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 5. These changepoints are located at: 0, 6, 54, 66, 96. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P5

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 6, 48. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P6

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 7. These changepoints are located at: 0, 24, 30, 48, 60, 84, 96. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P7

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 4. These changepoints are located at: 0, 24, 72, 96. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P8

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 3. These changepoints are located at: 0, 42, 96. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P9

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 3. These changepoints are located at: 36, 60, 90. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P10

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 5. These changepoints are located at: 0, 6, 42, 66, 96. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P11

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 2. These changepoints are located at: 6, 84. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P12

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 2. These changepoints are located at: 12, 60. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P13

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 2. These changepoints are located at: 6, 66. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P14

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 3. These changepoints are located at: 6, 30, 54. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P15

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 2. These changepoints are located at: 24, 72. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

save(changepoints_cusum,
       file = "./Data/RGenerated/ECPChangePointsCUSUM.RData")

3.2 Comments on the the Change Point Detection Method

To compare the performance of the ECP approach, we also investigated using the approach of Cappizi and Masarotto (2016) for another nonparameteric multivariate changepoint analysis method (dfphase1). Similar to the ecp approach, we applied their methodology on the: (a) median filtered data; and (b) cusums of the median filtered data with the similar window size, i.e. 120 strides (\(\small m_{dfphase1}=120\)). Note that we did not perform the analysis on the differences since we concluded that it was less informative than (a) and (b).

For the parameterization of the Cappizi and Masarotto (2016) approach, we used the default parameters of the mphase1() function from the dfphase1 package with the exception of:
– isolated = FALSE since we do not expect shifts to last only 120 strides.

Based on our results, the approach of Matterson and James (2016) is preferred since it:

  1. Results in a more limited number of changepoints, highlighting more (practically) significant changes in the gait parameters. This result was somewhat surprising since we used the default settings for the approach of Capizzi and Masarotto (2016) throught the default mphase1(), which prevents having changepoints within 5 windows. Note that this result holds true even when we adjusted the \(\small \alpha =0.001\) instead of the default of \(\small \alpha =0.05\):) for the mphase1() method (without adjusting the ecp() method since it does not allow for adjusting the false alarm rate); and

  2. Is more computationally efficient. It had a shorter run time than the other approach.

4 Comparing the ECP Changepoints with Participants’ Subjective Ratings

Throughout the experiments, participants provided their subjective ratings every 10 minutes. The ratings were based on the Borg 6-20 RPE scale. Recent literature on fatigue detection suggests that a rating of 13 can be used to estimate the onset of fatigue (see Maman et al. (2017) for a detailed analysis). The subjective ratings for each participant can be accessed at: subjective-ratings-raw.

4.1 Plots with the Median Filtered Data and their ECP Changepoints

In this section, we perform the following tasks:

  1. EDA, where we overlay the subjective ratings with the Median Filtered changepoints;

  2. A threshold based comparison, where we use \(\small RPE \ge 13\) to identify when a participant becomes fatigued and we compare that to the changepoints detected by the hybrid CUSUM-EPC approach; and

  3. Utilize the e.divisive() from the ecp package to obtain the changepoints for the Borg Ratings. Note that this function is the univariate approach for the multiple changepoint method examined in Section 3.1.

RatingThresh <- 13  # based on Maman et al. 2017 
RPE = read.csv("https://raw.githubusercontent.com/jqtcase/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "percent.time.from.start"
df_RPE_transformed <- melt(RPE,
                           id.vars = "percent.time.from.start",
                           measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints 
load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
load(file = "./Data/RGenerated/ECPChangePointsMFData.RData")
changepoints_subj_ecp <- list()
changepoints_subj_thr <- list()

for (i in 1:15) {
  correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
  dataholder <- df_RPE_transformed[correct_subject_rows,]
  
  
  # -------- Calculation of Fatigue based on RPE>= 13 -----------------
  fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>% 
                              min() # returns window number
  changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
  df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) %>% na.omit() # to make dataframe of subjective ratings for ggplot
  
  #------- Estimation of Fatigue Changepoints based on ECP of the RPE Data --
  y <- e.divisive(df_RPE,min.size = 2)$estimates
  fatigue_from_ecp_changepoint = y[2:(length(y)-1)] # removing the first and last points as trivials changepoints
  changepoints_subj_ecp[[i]] <- df_RPE[fatigue_from_ecp_changepoint,1]
  
  
  cat('###',paste0("P", i), "{-}",'\n')
  df_transformed <- melt(get(paste0("subject",i,"_medianf_norm")),
                       id.vars = "percent.time.from.start",
                       measure.vars=c("scaled.stride.len",
                                      "scaled.stride.height",
                                      "stride.duration"
                                      )
                         )
  
  df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,])
      
  assign(paste0("g",i),
        ggplot(data = df_transformed,
               aes(x=percent.time.from.start, y=value, colour = variable)) + 
          geom_line(size=1) +
          theme_bw() +
          scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
          facet_wrap(~variable,nrow=5, scales = "free_y") +
          # labs(title = paste("A paneled time-series plot for Participant",i)) + 
          geom_vline(xintercept= changepoints_medianf[[i]], 
                     show.legend = T, size=2) + 
          geom_vline(xintercept= changepoints_subj_thr[[i]],
                     linetype="solid", color="steelblue1",
                     show.legend = T, size=2) +
          geom_vline(xintercept= changepoints_subj_ecp[[i]],
                     linetype="dotted", color="gray80",
                     show.legend =T, size=2) +
          theme(legend.position="none",
                 plot.title = element_text(hjust = 0.5))
      )
      
      print(get(paste0("g",i)))
  
  cat('\n \n')
  
  cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following.  First, the <b> black  lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. </b> These changepoints are what we obtained in Section 3.1.1. Second, the <font color= #63B8FF> <b>steelblue line  correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the <b> <font color=#CCCCCC> gray lines  denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. </font> </b>',
'.  </font> </b> \n \n'))
  
  cat(paste0('<source> <p> Based on the <b>ecp package, the number of changepoints for the the median filtered data for participant ', i,
             ' is equal to: ',
             length(changepoints_medianf[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_medianf[[i]]), collapse = ", "),
             '.  <font color= #63B8FF> The threshold based changepoint in the subjective RPE is located at: ',
             paste(changepoints_subj_thr[[i]]),
             '. </font> </b> In addition, <b> <font color=#CCCCCC> the number of changepoints based on the RPE values for this subject is equal to: ',
             length(changepoints_subj_ecp[[i]]),
             ". These changepoints are located at: ",
             paste(changepoints_subj_ecp[[i]], collapse = ", "),
             '. </b> </font> The results from the analysis above can be found at: <a href="https://github.com/jqtcase/fatigue-changepoint/blob/master/Data/RGenerated/ChangepointsRPE.RData">ChangepointsRPE.RData</a>. </p> </source>')
      )
  
  cat('\n \n')
  
  }

P1

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 1 is equal to: 2. These changepoints are located at: 60, 78. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P2

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 2 is equal to: 3. These changepoints are located at: 0, 66, 96. The threshold based changepoint in the subjective RPE is located at: 53. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P3

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 3 is equal to: 2. These changepoints are located at: 18, 90. The threshold based changepoint in the subjective RPE is located at: 35. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. The results from the analysis above can be found at: ChangepointsRPE.RData.

P4

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 4 is equal to: 3. These changepoints are located at: 0, 60, 96. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P5

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 5 is equal to: 3. These changepoints are located at: 0, 42, 96. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P6

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 6 is equal to: 3. These changepoints are located at: 0, 54, 96. The threshold based changepoint in the subjective RPE is located at: 76. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P7

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 7 is equal to: 2. These changepoints are located at: 0, 96. The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P8

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 8 is equal to: 2. These changepoints are located at: 18, 54. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 41. The results from the analysis above can be found at: ChangepointsRPE.RData.

P9

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 9 is equal to: 1. These changepoints are located at: 6. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P10

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 10 is equal to: 2. These changepoints are located at: 60, 90. The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. The results from the analysis above can be found at: ChangepointsRPE.RData.

P11

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 11 is equal to: 3. These changepoints are located at: 0, 18, 96. The threshold based changepoint in the subjective RPE is located at: 100. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P12

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 12 is equal to: 2. These changepoints are located at: 54, 66. The threshold based changepoint in the subjective RPE is located at: 82. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P13

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 13 is equal to: 1. These changepoints are located at: 6. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P14

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 14 is equal to: 4. These changepoints are located at: 0, 24, 42, 96. The threshold based changepoint in the subjective RPE is located at: 41. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P15

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 3.1.1. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the the median filtered data for participant 15 is equal to: 2. These changepoints are located at: 6, 30. The threshold based changepoint in the subjective RPE is located at: 29. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

save(changepoints_subj_ecp, changepoints_subj_thr,
     file = "./Data/RGenerated/ChangepointsRPE.RData")

4.2 Plots with the CUSUM of the Median Filtered Data and the ECP-CUSUM Changepoints

In this section, we perform the following tasks:

  1. EDA, where we overlay the subjective ratings with the CUSUM changepoints;

  2. A threshold based comparison, where we use \(\small RPE \ge 13\) to identify when a participant becomes fatigued and we compare that to the changepoints detected by the hybrid CUSUM-EPC approach; and

  3. Utilize the e.divisive() from the ecp package to obtain the changepoints for the Borg Ratings. Note that this function is the univariate approach for the multiple changepoint method examined in Section 3.1.

RatingThresh <- 13  # based on Maman et al. 2017 
RPE = read.csv("https://raw.githubusercontent.com/jqtcase/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "percent.time.from.start"
df_RPE_transformed <- melt(RPE,
                           id.vars = "percent.time.from.start",
                           measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints 
load(file = "./Data/RGenerated/NormCusumData.RData")
load(file = "./Data/RGenerated/ECPChangePointsCUSUM.RData")
changepoints_subj_ecp <- list()
changepoints_subj_thr <- list()

for (i in 1:15) {
  correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
  dataholder <- df_RPE_transformed[correct_subject_rows,]
  
  
  # -------- Calculation of Fatigue based on RPE>= 13 -----------------
  fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>% 
                              min() # returns window number
  changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
  df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) %>% na.omit() # to make dataframe of subjective ratings for ggplot
  
  #------- Estimation of Fatigue Changepoints based on ECP of the RPE Data --
  y <- e.divisive(df_RPE,min.size = 2)$estimates
  fatigue_from_ecp_changepoint = y[2:(length(y)-1)] # removing the first and last points as trivials changepoints
  changepoints_subj_ecp[[i]] <- df_RPE[fatigue_from_ecp_changepoint,1]
  
  cat('###',paste0("P", i), "{-}",'\n')
  df_transformed <- melt(get(paste0("subject",i,"_cusum_norm")),
                       id.vars = "percent.time.from.start",
                       measure.vars=c("scaled.stride.len",
                                      "scaled.stride.height",
                                      "stride.duration"
                                      )
                         )
  df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,])
  
  assign(paste0("g",i),
        ggplot(data = df_transformed,
               aes(x=percent.time.from.start, y=value, colour = variable)) + 
          geom_line(size=1) +
          theme_bw() +
          scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
          facet_wrap(~variable,nrow=5, scales = "free_y") +
          # labs(title = paste("A paneled time-series plot for Participant",i)) + 
          geom_vline(xintercept= changepoints_cusum[[i]], 
                     show.legend = T, size=2) + 
          geom_vline(xintercept= changepoints_subj_thr[[i]],
                     linetype="solid", color="steelblue1",
                     show.legend = T, size=2) +
          geom_vline(xintercept= changepoints_subj_ecp[[i]],
                     linetype="dotted", color="gray80",
                     show.legend =T, size=2) +
          theme(legend.position="none",
                 plot.title = element_text(hjust = 0.5))
      )
      
      print(get(paste0("g",i)))
  
  cat('\n \n')
  
  cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following.  First, the <b> black  lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. </b> These changepoints are what we obtained in Section 3.1.2. Second, the <font color= #63B8FF> <b>steelblue line  correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the <b> <font color=#CCCCCC> gray lines  denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. </font> </b>',
'. </font> </b> \n \n'))
  
  cat(paste0('<source> <p> Based on the <b>ecp package, the number of changepoints for the cusums of the median filtered data for participant ', i,
             ' is equal to: ',
             length(changepoints_cusum[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_cusum[[i]]), collapse = ", "),
             '.  <font color= #63B8FF> The threshold based changepoint in the subjective RPE is located at: ',
             paste(changepoints_subj_thr[[i]]),
             '. </font> </b> In addition, <b> <font color=#CCCCCC> the number of changepoints based on the RPE values for this subject is equal to: ',
             length(changepoints_subj_ecp[[i]]),
             ". These changepoints are located at: ",
             paste(changepoints_subj_ecp[[i]], collapse = ", "),
             '. </b> </font> Note that the results from the analysis above can be found at: <a href="https://github.com/jqtcase/fatigue-changepoint/blob/master/Data/RGenerated/ChangepointsRPE.RData">ChangepointsRPE.RData</a>. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]). </p> </source>')
      )
             
  
  cat('\n \n')
  
  }

P1

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 2. These changepoints are located at: 24, 72. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P2

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 2. These changepoints are located at: 42, 78. The threshold based changepoint in the subjective RPE is located at: 53. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P3

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 4. These changepoints are located at: 12, 36, 72, 84. The threshold based changepoint in the subjective RPE is located at: 35. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P4

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 5. These changepoints are located at: 0, 6, 54, 66, 96. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P5

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 6, 48. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P6

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 7. These changepoints are located at: 0, 24, 30, 48, 60, 84, 96. The threshold based changepoint in the subjective RPE is located at: 76. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P7

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 4. These changepoints are located at: 0, 24, 72, 96. The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P8

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 3. These changepoints are located at: 0, 42, 96. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 41. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P9

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 3. These changepoints are located at: 36, 60, 90. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P10

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 5. These changepoints are located at: 0, 6, 42, 66, 96. The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P11

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 2. These changepoints are located at: 6, 84. The threshold based changepoint in the subjective RPE is located at: 100. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P12

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 2. These changepoints are located at: 12, 60. The threshold based changepoint in the subjective RPE is located at: 82. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P13

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 2. These changepoints are located at: 6, 66. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 41. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P14

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 3. These changepoints are located at: 6, 30, 54. The threshold based changepoint in the subjective RPE is located at: 41. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P15

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 3.1.2. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. .

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 2. These changepoints are located at: 24, 72. The threshold based changepoint in the subjective RPE is located at: 29. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

save(changepoints_subj_thr,changepoints_subj_ecp,
       file = "./Data/RGenerated/ChangepointsRPE.RData")

5 Time-Series Clustering

To perform time-series clustering, we can do the following:

  1. univariate time-series clustering:
    • using the first PC of the scaled and then median filtered data; or
    • using the first PC of the scaled, median filtered and cusumed data;
  2. multivariate time-series clustering:
    • using the median filtered scaled data;
    • using the scaled, median filtered and then cusumed data.

We show the results from univariate time-series clustering using the first PC of the scaled median filtered data in the following subsection.

5.1 Univariate TimeSeries Clustering based on the First PC of the Scaled Sensor Data

load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
load(file = "./Data/RGenerated/NormCusumData.RData")
# medianfilt_norm, cusum_norm

pca_subjects_loadings_scaled <- list()
pca_subjects_scaled <- data.frame(matrix(nrow=2000, ncol=15))
pc_prop_var_explained_scaled <- vector()

for (i in 1:15) {
  df <- get(paste0("subject",i,"_medianf_norm"))
  pca_dataholder <-  prcomp(df[,2:4], center=TRUE, scale. = T) # Scale <- Standardize Vars
  pca_subjects_loadings_scaled[[i]] <- pca_dataholder$rotation[,1] # Weights for First PC
  # Understanding the % Variation Explained by the First PC
  pc_std <- pca_dataholder$sdev
  pc_var <- pc_std^2
  pc_prop_var_explained_scaled[i] <- round(pc_var[1]/sum(pc_var),3)
  #  Storing the Data for First PC rotated values
  pca_subjects_scaled[,i] <- pca_dataholder$x[,1] # First PC
}

distances <- diss(pca_subjects_scaled, "DTWARP")
dimnames(distances) <- paste0("P",seq(1,15,1))

cat(paste0('<source> <p> Based on the above analysis, the % variation explained by the first principal component of the scaled median filtered data for each participant is equal to: ', paste(round(pc_prop_var_explained_scaled, 2), collapse = ", "), '. To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the <i> Dynamic Time Warping Method </i>. The resulting dendogram is shown in the figure below. </p> </source>'))

Based on the above analysis, the % variation explained by the first principal component of the scaled median filtered data for each participant is equal to: 0.75, 0.8, 0.56, 0.88, 0.54, 0.68, 0.4, 0.74, 0.5, 0.78, 0.83, 0.54, 0.48, 0.76, 0.73. To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the Dynamic Time Warping Method . The resulting dendogram is shown in the figure below.

# distances.matrix <- as.matrix(distances)
# heatmap(distances.matrix)


clusters <- hclust(distances)

plot(clusters, hang=0.1, check = TRUE, 
     axes = TRUE, ann = TRUE, ylab="Height",
     main = "Cluster Dendogram based on the DTWARP distances")

cat(paste0('Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the  cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: \n', 'When using k=3, the assignments are as follows: \n'))

Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the cutree function in R. We show the resulting cluster membership in the table below. The associated cluster assignments for both thresholds are as follows: When using k=3, the assignments are as follows:

clusterCut3 <- cutree(clusters, 3)
clusterCut4 <- cutree(clusters, 4)

save(clusterCut3, clusterCut4, pc_prop_var_explained_scaled,
     file = "./Data/RGenerated/ScaledPcaClusters.RData")

#kable(t(clusterCut4)) %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)

kable(t(clusterCut3), row.names = NA, caption = "Cluster Assignment for each Participant") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
Cluster Assignment for each Participant
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
1 1 2 1 1 1 3 3 3 1 1 3 3 3 1

5.2 Presenting TimeSeries Clustering Results with Demographic Information

We provide the overall result of clustering by overlaying the graphs of features and subjective rating of RPE for each cluster, i.e. C1 and C3. The graphical results are provided in the following section in terms of percent task completion (0-100%) for each subject.

load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
load(file = "./Data/RGenerated/ScaledPcaClusters.RData")

RPE = read.csv("https://raw.githubusercontent.com/jqtcase/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "percent.time.from.start"
df_RPE_transformed <- melt(RPE,
                           id.vars = "percent.time.from.start",
                           measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0("RPE.",df_RPE_transformed[,2])
df_RPE_transformed$newcolumn <- "RPE"
colnames(df_RPE_transformed)[4] <- "feature"

clust_1_data <- df_RPE_transformed[FALSE,]
clust_2_data <- df_RPE_transformed[FALSE,]
clust_3_data <- df_RPE_transformed[FALSE,]

for (i in 1:15) {
  correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("RPE.Subject.",i))

  dataholder <- get(paste0("subject",i,"_medianf_norm"))
  colnames(dataholder) <- c("percent.time.from.start",
                              paste0("scaled.stride.len.",i),
                              paste0("scaled.stride.height.",i),
                              paste0("stride.duration.",i))
                            
  df_transformed <- melt(dataholder,
                     id.vars = "percent.time.from.start",
                       measure.vars=c(paste0("scaled.stride.len.",i),
                                      paste0("scaled.stride.height.",i),
                                      paste0("stride.duration.",i)
                                      )
                       )
  df_transformed$newcolumn <- c(replicate(2000, "scaled.stride.len"),
                                replicate(2000, "scaled.stride.height"),
                                replicate(2000, "stride.duration"))
  colnames(df_transformed)[4] <- "feature"
  df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,])

  
  if (clusterCut3[i]==1) {
    clust_1_data <- rbind(clust_1_data,df_transformed)
  } else if (clusterCut3[i]==2) {
    clust_2_data <- rbind(clust_2_data,df_transformed)
  } else if (clusterCut3[i]==3) {
    clust_3_data <- rbind(clust_3_data,df_transformed)
  }
  
}


#########################################################################

demograph = read.csv("https://raw.githubusercontent.com/jqtcase/fatigue-changepoint/master/Data/Raw/DemographicData.csv") # to read demographic data from GitHub'


clust_1_table <- demograph[FALSE,]
clust_2_table <- demograph[FALSE,]
clust_3_table <- demograph[FALSE,]

for (i in 1:15) {
 if (clusterCut3[i]==1) {
    clust_1_table <- rbind(clust_1_table,demograph[i,])
  } else if (clusterCut3[i]==2) {
    clust_2_table <- rbind(clust_2_table,demograph[i,])
  } else if (clusterCut3[i]==3) {
    clust_3_table <- rbind(clust_3_table,demograph[i,])
  }
}

clust_1_means <- colMeans(x=clust_1_table[,2:7], na.rm = TRUE)
clust_1_sd <- apply(clust_1_table[,2:7], 2, sd, na.rm = TRUE)
clust_1_summary <- rbind(clust_1_means,clust_1_sd)

clust_2_means <- colMeans(x=clust_2_table[,2:7], na.rm = TRUE)
clust_2_sd <- apply(clust_2_table[,2:7], 2, sd, na.rm = TRUE)
clust_2_summary <- rbind(clust_2_means,clust_2_sd)

clust_3_means <- colMeans(x=clust_3_table[,2:7], na.rm = TRUE)
clust_3_sd <- apply(clust_3_table[,2:7], 2, sd, na.rm = TRUE)
clust_3_summary <- rbind(clust_3_means,clust_3_sd)

#########################################################################
 

cat('###',paste0("C", 1), "{-}",'\n')

C1

assign(paste0("g",1),
    ggplot(data = clust_1_data,
           aes(x=percent.time.from.start, y=value, colour = variable)) + 
      geom_line(size=1) +
      theme_bw() +
      facet_wrap(~feature,nrow=4, scales = "free_y") +
      labs(title = paste("A paneled time-series plot for cluster",1)) + 
      theme(legend.position="none",
             plot.title = element_text(hjust = 0.5))
  )
  
  print(get(paste0("g",1)))

kable(data.matrix(clust_1_summary) , format = "html") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
Gender Age Weight Height Borg Fatigue
clust_1_means 0.3750000 39.12500 73.38750 171.750000 14.87500 7.2500000
clust_1_sd 0.5175492 17.31587 18.54581 9.452891 1.95941 0.8864053
cat('###',paste0("C", 2), "{-}",'\n')

C2

assign(paste0("g",2),
    ggplot(data = clust_2_data,
           aes(x=percent.time.from.start, y=value, colour = variable)) + 
      geom_line(size=1) +
      theme_bw() +
      facet_wrap(~feature,nrow=4, scales = "free_y") +
      labs(title = paste("A paneled time-series plot for cluster",2)) + 
      theme(legend.position="none",
             plot.title = element_text(hjust = 0.5))
  )
  
  print(get(paste0("g",2)))

kable(data.matrix(clust_2_summary) , format = "html") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
Gender Age Weight Height Borg Fatigue
clust_2_means 0 23 69.3 171 15 8
clust_2_sd NA NA NA NA NA NA
cat('###',paste0("C", 3), "{-}",'\n')

C3

assign(paste0("g",3),
    ggplot(data = clust_3_data,
           aes(x=percent.time.from.start, y=value, colour = variable)) + 
      geom_line(size=1) +
      theme_bw() +
      facet_wrap(~feature,nrow=4, scales = "free_y") +
      labs(title = paste("A paneled time-series plot for cluster",3)) + 
      theme(legend.position="none",
             plot.title = element_text(hjust = 0.5))
  )
  
  print(get(paste0("g",3)))

kable(data.matrix(clust_3_summary) , format = "html") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
Gender Age Weight Height Borg Fatigue
clust_3_means 0.6666667 38.50000 76.20000 169.50000 14.833333 6.833333
clust_3_sd 0.5163978 19.21198 11.26535 11.16692 2.136976 1.329160

5.3 Interpretation of the Clustering Results Compared to the Subjects’ Demographic/Anthropometric/Subjective Information

The overall results show a tendency for a gender dependency in the grouping scheme (gender index for C1 = 0.37 and for C3 = 0.57) indicating a convergence in the stride features toward the second half of the experiment for C1. This condition is applicable for the RPE in C2. A multinomial logistic regression was performed on each of the measures in the Figures in Section 5.2 as well as the whole set to examine the predictability of the cluster membership from the demographic/anthropometric/subjective measures, however, none of the parameters were significant on the relative risk of being in C1 compared to C3, i.e, p-value > 0.05. We are hoping to develop a general diagnosis/understanding of how fatigue develops (e.g., we start with a ramp up phase, then steady state, then fatigue starting, and then person is totally fatigued).